Multimodal fusion

Wan2.1 T2V 14B FusionX GGUF

A quantized text-to-video model: the base model converted to the GGUF format for use in ComfyUI, giving more options for text-to-video generation.

Text-to-Video, English
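
Both Wan2.1 entries are distributed as GGUF files. As a rough illustration of what that container looks like, the sketch below parses only the fixed 24-byte GGUF header: the ASCII magic `GGUF`, a little-endian uint32 format version, and two uint64 counts (tensors and metadata key/value pairs). It assumes the v2/v3 header layout (v1 used 32-bit counts) and does not read metadata or tensor data. The names `GGUFHeader` and `parse_gguf_header` are hypothetical helpers, not part of ComfyUI or any GGUF library.

```python
import struct
import sys
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed-size GGUF header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the header at the start of a GGUF file (assumes v2/v3 layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # '<' = little-endian, no padding; I = uint32 version; Q, Q = uint64 counts
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__":
    # Usage: python gguf_header.py path/to/model.gguf  (path is illustrative)
    with open(sys.argv[1], "rb") as f:
        print(parse_gguf_header(f.read(HEADER_SIZE)))
```

Reading only the first 24 bytes keeps the check cheap even for multi-gigabyte 14B checkpoints; a full reader would go on to decode the metadata key/value pairs that follow the header.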

Wan2.1 14B T2V FusionX FP8 GGUF

A GGUF conversion of the vrgamedevgirl84/Wan14BT2VFusionX model, intended mainly for text-to-video generation.

LiLT InfoXLM Base

LiLT-InfoXLM is a language-independent layout transformer built by combining the pre-trained InfoXLM text model with LiLT (Language-Independent Layout Transformer). It is suited to structured document understanding tasks.

Multimodal Fusion
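
Layout transformers such as LiLT take a bounding box for every input token alongside the text. The sketch below assumes the LayoutLM-style convention of scaling pixel coordinates `(x0, y0, x1, y1)` onto a 0-1000 grid relative to the page size, and of giving every sub-word token the box of the word it came from. Both helper names are hypothetical; check the model card or the Hugging Face LiLT documentation for the exact input format before relying on this.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def normalize_bbox(box: Sequence[float], width: float, height: float) -> Box:
    """Scale a pixel box (x0, y0, x1, y1) onto a 0-1000 grid (assumed convention)."""
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def word_boxes_to_token_boxes(word_boxes: Sequence[Box],
                              tokens_per_word: Sequence[int]) -> List[Box]:
    """Repeat each word's box once per sub-word token the tokenizer produced."""
    if len(word_boxes) != len(tokens_per_word):
        raise ValueError("need one token count per word box")
    token_boxes: List[Box] = []
    for box, n_tokens in zip(word_boxes, tokens_per_word):
        token_boxes.extend([box] * n_tokens)
    return token_boxes


if __name__ == "__main__":
    # A 500 x 1000 px page with two OCR'd words; the first splits into 2 tokens.
    words = [normalize_bbox((50, 100, 150, 200), 500, 1000),
             normalize_bbox((200, 100, 300, 200), 500, 1000)]
    print(word_boxes_to_token_boxes(words, [2, 1]))
```

Normalizing to a fixed grid makes the layout input independent of scan resolution, which is part of what lets the same layout branch be reused across documents.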

Macbert Ngram Miao

A Transformer-based language model that supports a range of natural language processing tasks.

Large Language Model

© 2025 AIbase